Linking Textual Resources to Support Information Discovery
Abstract
A vast amount of information is stored today in the form of textual documents, many of which are available online. These documents come from different sources and are of different types, including newspaper articles, books, corporate reports, encyclopedia entries and research papers. At a semantic level, these documents contain knowledge that was created by explicitly connecting pieces of information and expressing them in natural language. However, a significant amount of knowledge is not explicitly stated in any single document, yet can be derived or discovered by researching, i.e. accessing, comparing, contrasting and analysing, information from multiple documents. Carrying out this work using traditional search interfaces is tedious because of information overload and the difficulty of formulating queries that would help us discover information we are not aware of. To support this exploratory process, we need to be able to navigate effectively between related pieces of information across documents. While information can be connected using manually curated cross-document links, this approach does not scale and cannot systematically assist us in discovering sometimes non-obvious (hidden) relationships. Consequently, there is a need for automatic approaches to link discovery. This work studies how people link content, investigates the properties of different link types, presents new methods for automatic link discovery and designs a system in which link discovery is applied to a collection of millions of documents to improve access to public knowledge.
Similar resources
Knowledge Extraction for Information Retrieval
Document retrieval is the task of returning relevant textual resources for a given user query. In this paper, we investigate whether the semantic analysis of the query and the documents, obtained exploiting state-of-the-art Natural Language Processing techniques (e.g., Entity Linking, Frame Detection) and Semantic Web resources (e.g., YAGO, DBpedia), can improve the performances of the traditio...
Building the Multi-layer Theory of Association Semantic based on the Power-law Distribution of Linking Keywords
Web information contains plentiful, significant knowledge that users are eager to explore. Effective semantic layering technology can not only provide theoretical support for knowledge discovery in Web resources, but also improve the search efficiency of related information systems. This paper builds the multi-layer theory of association semantic based on the power-law distributi...
Chapter 17. LinkOut: Linking to External Resources from Entrez Databases
LinkOut is a powerful linking feature of the Entrez search and retrieval system (Chapter 15). It is designed to provide Entrez users with links from database records to a wide variety of relevant online resources, including full-text publications, biological databases, consumer health information, and research tools. (See Sample Links for examples of LinkOut resources.) The goal of LinkOut is t...
A Supervised Method for Constructing Sentiment Lexicon in Persian Language
Due to the increasing growth of digital content on the internet and social media, sentiment analysis has become an emerging field. This problem, which deals with information extraction and knowledge discovery from textual data using natural language processing, has attracted the attention of many researchers. Construction of a sentiment lexicon as a valuable language resource is one of the imp...
An Experimental Method for Measuring the Emergence of New Ideas in Information Discovery
While sometimes the task that motivates searching, browsing, and collecting information resources is finding a particular fact, humans often use information resources in intellectual and creative tasks that can include comparison, understanding, and discovery. Information discovery tasks involve not only finding relevant information, but also seeing relationships among collected information res...